Statistical Machine Translation: Rapid Development with Limited Resources

Authors

  • George Foster
  • Simona Gandrabur
  • Philippe Langlais
  • Pierre Plamondon
  • Graham Russell
  • Michel Simard
Abstract

We describe an experiment in rapid development of a statistical machine translation (SMT) system from scratch, using limited resources: under this heading we include not only training data, but also computing power, linguistic knowledge, programming effort, and absolute time.

Similar resources

The Temple Translator's Workstation Project

The Temple project has developed an open multilingual architecture and software support for rapid development of extensible Machine Translation functionalities. The targeted languages are those for which Natural Language Processing and human resources are scarce or difficult to obtain. The goal is to support rapid development of machine translation functionalities in a very short time with lim...

The Temple Web Translator

New Web sites in foreign languages are appearing every day, and language barriers threaten to atomize the World Wide Web into closed linguistic communities. The Temple project has developed an open multilingual architecture and software support for rapid development of machine translation systems for assimilation purposes. The targeted languages are those for which natural language processing an...

Resource Report: Building Parallel Text Corpora for Multi-Domain Translation System

Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. However, manual translations are very costly, and the amount of known parallel text is limited. Hence, our research started with creating and collecting a large amount of parallel text resources for Indonesian-English. We describe in this paper the creation ...

Building an English-Iraqi Arabic Machine Translation System for Spoken Utterances with Limited Resources

This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system using limited resources. In it, we explore the constraints involved, how we endeavored to mitigate such problems as a non-standard orthography and a highly inflected grammar, and discuss leveraging existing plentiful resources for Modern Standard Arabic to assist in this task. These combined tech...

Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion

Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translations and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translations are v...

Publication date: 2003